Collaborator

@borfast borfast commented Aug 12, 2025

In the context of IN-524, we want to have the archived and excluded repositories for each project returned along with the project data.

This means adding the repository data from the segmentRepositories table, which already includes the archived and excluded flags, to TinyBird, and then merging that data into the existing pipes that return project data, namely the projects_list and search_collections_projects_repos pipes.

This PR:

  • Adds the database migration to ensure we get data from the segmentRepositories table into Sequin.
  • Adds a new segmentRepositories data source in TinyBird to receive data from the aforementioned table.
  • Modifies the insights_projects_populated_copy pipe to include the list of archived and excluded repositories. That list is then propagated to the insights_projects_populated_ds data source, which feeds the insightsProjects_filtered pipe and, finally, the projects_list pipe, which is the end goal.
    • The search_collections_projects_repos pipe now contains archived and excluded columns for the repository results. This required adding a JOIN to the query, and given all the optimisations that were done not too long ago, I'm not sure if this
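
To illustrate the aggregation and merge described in the list above, here is a minimal Python sketch. It is not the actual pipe SQL: the row shapes, the field names (project_id, repository, archived, excluded, archivedRepositories, excludedRepositories) and the example URLs are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for the segmentRepositories table.
# All field names here are assumptions, not the real schema.
segment_repositories = [
    {"project_id": "p1", "repository": "https://example.com/org/a", "archived": True, "excluded": False},
    {"project_id": "p1", "repository": "https://example.com/org/b", "archived": False, "excluded": True},
    {"project_id": "p1", "repository": "https://example.com/org/c", "archived": False, "excluded": False},
    {"project_id": "p2", "repository": "https://example.com/org/d", "archived": True, "excluded": True},
]

# Hypothetical project rows, standing in for the existing project data.
projects = [
    {"id": "p1", "name": "Project One"},
    {"id": "p2", "name": "Project Two"},
    {"id": "p3", "name": "Project Three"},
]


def collect_flagged_repositories(rows):
    """Group archived and excluded repository URLs by project id."""
    archived = defaultdict(list)
    excluded = defaultdict(list)
    for row in rows:
        if row["archived"]:
            archived[row["project_id"]].append(row["repository"])
        if row["excluded"]:
            excluded[row["project_id"]].append(row["repository"])
    return archived, excluded


def attach_flagged_repositories(project_rows, repo_rows):
    """Left-join the per-project lists onto the project rows.

    Projects without any flagged repository get empty lists, so every
    project is still returned.
    """
    archived, excluded = collect_flagged_repositories(repo_rows)
    return [
        {
            **project,
            "archivedRepositories": archived.get(project["id"], []),
            "excludedRepositories": excluded.get(project["id"], []),
        }
        for project in project_rows
    ]


if __name__ == "__main__":
    for row in attach_flagged_repositories(projects, segment_repositories):
        print(row)
```

The sketch aggregates first and joins second, so each project row is matched against one pre-grouped entry rather than against every repository row. Whether the real pipes are structured this way is not stated in this conversation.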

Specific tickets in Jira:

@borfast borfast requested a review from Copilot August 12, 2025 22:29
@borfast borfast self-assigned this Aug 12, 2025
@borfast borfast requested a review from epipav as a code owner August 12, 2025 22:29
Contributor

@github-actions github-actions bot left a comment

Conventional Commits FTW!

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements the infrastructure to propagate archived and excluded repository data from the segmentRepositories table through the TinyBird pipeline to support enhanced project filtering and search capabilities.

  • Adds database migration to enable replication of the segmentRepositories table to TinyBird
  • Creates a new segmentRepositories data source in TinyBird and updates pipeline to aggregate archived/excluded repository lists
  • Propagates archived repositories data through the insights project pipeline to enable repository-aware search functionality

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| backend/src/database/migrations/V1753798345__segmentRepositories_replication.sql | Enables replication of segmentRepositories table to TinyBird |
| services/libs/tinybird/datasources/segmentRepositories.datasource | Defines TinyBird data source schema for segmentRepositories table |
| services/libs/tinybird/datasources/insights_projects_populated_ds.datasource | Adds archived/excluded repositories fields to project data source |
| services/libs/tinybird/pipes/insights_projects_populated_copy.pipe | Aggregates archived/excluded repositories and joins to project data |
| services/libs/tinybird/pipes/insightsProjects_filtered.pipe | Passes through archived/excluded repositories to filtered results |
| services/libs/tinybird/pipes/search_collections_projects_repos.pipe | Exposes archived repositories in search results |

@borfast borfast changed the title from "Sata pipeline changes for segmentRepositories [IN-552]" to "feat: data pipeline changes for segmentRepositories [IN-552] [IN-554]" Aug 12, 2025
@borfast borfast requested a review from themarolt as a code owner August 16, 2025 11:31
@borfast
Collaborator Author

borfast commented Aug 27, 2025

Ping @epipav

@borfast borfast force-pushed the feature/in-524-data-pipeline-changes-for-segmentRepositories branch from eea8f5c to 4501d3e Compare August 28, 2025 00:31
@borfast borfast requested a review from epipav August 28, 2025 00:35
@joanagmaia
Contributor

@borfast can you confirm whether the PR is up to date with the latest comments? If so, can you gather all the questions you need to align on with the team and perhaps go over them in a quick sync?

@borfast borfast force-pushed the feature/in-524-data-pipeline-changes-for-segmentRepositories branch from 4501d3e to 3a3d9af Compare August 28, 2025 15:43
borfast added 10 commits August 30, 2025 00:55
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
…olumns

Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
@borfast borfast force-pushed the feature/in-524-data-pipeline-changes-for-segmentRepositories branch from e78438a to f7f5eb7 Compare August 29, 2025 23:56
Collaborator

@epipav epipav left a comment

I think the only issue is the ENGINE_VER of the new datasource, other than that looks good 👍
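
For context on the ENGINE_VER remark: in a Tinybird data source that uses the ReplacingMergeTree engine, ENGINE_VER names the column that decides which row survives when several rows share the same sorting key (the row with the highest version is kept). The Python sketch below imitates that behaviour. The review does not say what was wrong with the setting, and the field names (id, updatedAt, archived) are hypothetical.

```python
def collapse_by_version(rows, key_fields, version_field):
    """Keep one row per key: the one with the highest version value.

    On a version tie the later row in the input wins, mirroring the
    "last inserted" tie-break of a ReplacingMergeTree-style merge.
    """
    latest = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        current = latest.get(key)
        if current is None or row[version_field] >= current[version_field]:
            latest[key] = row
    return list(latest.values())


# Hypothetical replicated rows: two versions of the same record plus one other.
replicated_rows = [
    {"id": 1, "archived": False, "updatedAt": "2025-08-01T10:00:00"},
    {"id": 2, "archived": False, "updatedAt": "2025-08-01T10:00:00"},
    {"id": 1, "archived": True, "updatedAt": "2025-08-15T09:30:00"},
]

if __name__ == "__main__":
    for row in collapse_by_version(replicated_rows, ["id"], "updatedAt"):
        print(row)
```

If the version column is missing or points at a field that does not increase with each update, an older copy of a row can win the merge, which is one reason this setting tends to get attention in review.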

Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
@borfast borfast added the --skip-regression-tests To skip regression tests in Tinybird CI label Sep 1, 2025
@borfast borfast requested a review from epipav September 1, 2025 15:30
@borfast borfast removed the --skip-regression-tests To skip regression tests in Tinybird CI label Sep 1, 2025
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
@borfast borfast force-pushed the feature/in-524-data-pipeline-changes-for-segmentRepositories branch from 83c54c6 to 1d7cd18 Compare September 1, 2025 17:15
Signed-off-by: Raúl Santos <4837+borfast@users.noreply.github.com>
@borfast borfast merged commit 1dde2d5 into main Sep 1, 2025
14 of 15 checks passed
@borfast borfast deleted the feature/in-524-data-pipeline-changes-for-segmentRepositories branch September 1, 2025 17:44
3 participants